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Abstract. Moment-closure methods are popular tools to simplify the mathematical 
^ ' analysis of stochastic models defined on networks, in which high dimensional joint 

I . distributions are approximated (often by some heuristic argument) as functions of 

^ I lower dimensional distributions. Whilst undoubtedly useful, several such methods 

' suffer from issues of non-uniqueness and inconsistency. These problems are solved by 

an approach based on the maximisation of entropy, which is motivated, derived and 
implemented in this article. A series of numerical experiments are also presented, 
detailing the application of the method to the Susceptible-Infective-Recovered model 
of epidemics, as well as cautionary examples showing the sensitivity of moment-closure 
i-J^ ' techniques in general. 
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1. Introduction 

A great many problems of interest in statistical physics and applied mathematics may 
be modelled as a system of stochastic variables whose interactions form a network. The 
earliest such models were inspired by questions in chemistry and typically imposed a 
lattice interaction structure (see |ij and references therein for many examples), though 
as the applications of these models have diversified to include everything from finance [2] 
to ecology [3], the range of network topologies under consideration has expanded also. 
Whilst of great benefit to the real-world relevance of networked models, the complexity 
of the network structures now in vogue causes serious comphcations to any attempted 
mathematical analysis. 

In the absence of a network structure, the behaviour of well-mixed stochastic 
models may often be very closely described by systems of differential equations, which 
have proven to be indispensable tools in their analysis. It is a long-standing problem 
to reproduce this success in the field of networked models. The primary obstacle 
to this endeavor is the existence of correlations between spatially separated parts of 
the networked model. In mathematical terms, one finds that differential equations 
describing the joint distribution of some number of variables will typically depend on 



Maximum- entropy moment- closure for stochastic systems on networks 



2 



joint distributions of higher dimension, leading to a never ending expansion in the 
number of equations required 

For as long as this problem has been recognised, there have been approximation 
schemes suggested to deal with it. Often trading under the name 'moment-closure' 
these techniques amount to choosing a method by which to guess at higher dimensional 
distributions, based on knowledge of the lower dimensional distributions for which one 
wishes to write differential equations. 

The simplest non-trivial example of a moment-closure scheme is the pair 
approximation, proposed for ecological applications in [U [TJ [8] , but already with a long 
history in statistical physics (see, for example P [THl [H])- In this approximation, the 
state of the whole system is reduced to the joint probability of pairs of variables which 
neighbour each other in the network. The joint distribution of three connected variables 
is then approximated by one of two particular functions, depending on whether they 
form a triangle or a line in the network. This approximation has been widely applied 
and is successful for a reasonably broad range of models and parameters [31 HSj [131 E] i 
particularly in those of adaptive networks [151 HSl HZl HI]. There is, however, a 
mathematical problem with the pair approximation as it is usually applied to triangles. 
The formula given in [8l[T9] for the joint distribution of three variables forming a triangle 
is in general not a probability distribution at all (it is not normalised) and is typically 
inconsistent with the known marginal distribution of pairs. We will address this issue 
in detail in Sections 2.4 and 3.2. 

Moving beyond the pair approximation, more complex schemes have been proposed 
based on joint probabilities of three or more variables; examples include those of 
references [14l[20l[21]. The moment-closure methods employed in most such works are 
reminiscent of Bayes nets [22], being based on the assumption of some form of conditional 
independence, allowing the factorisation of high dimensional joint probabilities into 
smaller objects. Therein lies a problem of non-uniqueness: depending on the local 
configuration of the network, there may be several such factorisations to choose from 
and no logical motivation for making one choice over another. For a discussion of this 
problem in relation to an approximation based on triples, see |14] . 

This article will develop a moment- closure approach based on the maximisation 
of entropy. Put simply, when the need arises to approximate a high dimensional 
distribution based on knowledge of some lower dimensional marginals, it is wise to 
avoid introducing any additional, unmotivated, bias; this is achieved by choosing the 
distribution of maximum entropy subject to the known constraints. The approach has 
the distinct advantages of providing a unique and consistent approximation scheme for 
distributions of any dimension, and numerical experiments demonstrate considerable 
success in certain models. There is, however, the aesthetic downside that often 
the resulting approximation cannot be written as an explicit function of the lower 
dimensional distributions. In these circumstances, one must deal with it either implicitly 



I Alternatively, consideration of the time evolution of macroscopic observables in disordered systems 
on networks leads to a subtly different closure problem which has been addressed in, e.g. [11 [5]. 
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or numerically; some appropriate methods are outlined in Section 2.3. It should be noted 
that the idea of applying maximum-entropy to obtain moment-closure approximations 
is not unique to this article, having been discussed recently for continuous space models 
in physics [23] and ecology [21] . The emphasis here is on the application of the method 
to networked models, and the relationship to existing alternative schemes. 

To connect with other approaches, the maximum-entropy formalism is applied to 
the pair approximation in Section 2.4. In dealing with the case of three sites in a row 
the pair and maximum-entropy approximations agree, whilst for triangles we are able 
to suggest an alternative to the inconsistent choice usually made. The performance of 
this proposed alternative approximation is checked numerically in Section 3.2, with 
the surprising result that, whilst it is considerably better at predicting the joint 
distribution on triangles, the output from numerical integration of the moment-closed 
differential equations is notably worse. Further numerical investigation reveals that this 
phenomenon can be attributed to a bizarrely fortunate cancellation of error which occurs 
within the usual pair approximation when applied to networks with many triangles. 

To bring to a close the numerical investigations of the article, an experiment 
is proposed to test the robustness of moment- closure methods in general. The 
basic philosophy of moment- closure rests on the assumption that having a good 
approximation to the required higher dimensional joint distributions will endow the 
moment-closed system of differential equations with high predictive power. Despite its 
central importance to the method, the nature of the relationship between accuracy of 
approximation and predictive success has not been well studied. In Section 3.3 the 
results of a simple experiment are presented in which higher dimensional distributions 
are not approximated but rather measured from a simulation. It is shown that the 
application of a small systematic error in one direction or another induces a rather large 
discrepancy in the output of numerical integration of the moment-closed differential 
equations. This result suggests that the quality of predictions made by moment-closure 
approximations may be considerably more fluid than one might hope. 

2. Maximum-entropy approximation 

2.1. Description of the problem 

We consider a discrete time stochastic process [§| defined on a simple graph Q = {V,E). 
For notational simplicity, we label the sites by natural numbers: V = {1, . . . , A^}, where 
is the size of the graph. If {i,j) E E we say that i and j are neighbours in network, 
and we write di for the set of all neighbours of i. Upon each site i a time dependent 
variable Xi is placed, taking values in a finite set of states. In each timestep, a site i is 
chosen at random and the state of the variable Xj is updated. The new value is taken 
to depend only on the old value and those of its neighbours. 

§ No generality is lost here by not discussing continuous time processes as both formulations have the 
same first order behaviour under system size expansion. 
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A great number of processes of interest in statistical physics may be modelled 
according to this broad description. To fix ideas, we will refer throughout this article 
to the paradigmatic example of the Susceptible-Infected- Recovered (SIR) model of 
epidemics. Some number N of individuals are arranged in a network, with the presence 
of an edge indicating the possibility of infectious contact. The possible states of an 
individual are denoted S, I and R, for susceptible, infected and recovered, respectively. 
Per timestep a site i is chosen at random and the variable Xj updated according to: if 
Xi = I then we set Xi = R with probability 7; if Xj = i? then we set Xi = S with 
probability a; ii Xi = S then a neighbour j G di is chosen at random and if Xj = I then 
we set Xi = I with probability f3. The last reaction of this list models the transmission 
of the disease, which is restricted to pairs of neighbouring sites and thus the spread of 
the disease may be strongly dependent on the geometry of the network. 

Rather than always working with the detailed information of the exact state of 
each variable in a stochastic process, it is usual to consider macroscopic observables - 
statistical measures which summarise some aspect of the state of the system at a given 
time. The simplest example is the distribution of states of a randomly chosen site, we 
write 



I"! ... 

where I is the indicator function. Objects of this type are very useful in giving an 
overview of the behaviour of the system, for example, the progress of an SIR epidemic 
may be monitored by plotting the time evolution of P{I), that is, the fraction of the 
population currently infected. 

To gain more detailed information about the underlying process, the above 
construction may be generalised to arbitrary arrangements of variables in the network. 
Specifically, for a given (assumed small) graph g, let Pg{x) denote the fraction of induced 
subgraphs h G Q which are isomorphic to g and whose site variables take exactly the 
states X = {xi,X2, . . .) ■ Formally, we write 



•{''-sjllH^'' 

l{h = g] 



P,{x) = '-^ , , (1) 

hcg 

where = denotes graph isomorphism. A simple example is given by the joint distribution 
of pairs of neighbouring variables, setting g to be the graph of two connected sites, we 
may define 

P(x,y):=P,ix,y) = j^ J2 l{x, = x, X, = ?/} . 

We will refer to distributions of the type ([T]), which are measured directly from an 
underlying stochastic process, as empirical. Note that empirical distributions will change 
with time, though we have not made this explicit in the notation. 
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The usual methodology of analyses based on objects of the type Pg{x) is to 
move from regarding them as measurements of a stochastic system to treating them 
as dynamical variables in their own right. This is achieved by rescaling time by a 
factor of A^~^ and taking the limit N ^ oo (the limit of large networks), in which 
it is assumed that Pg{x) will approach a smooth non-random function, satisfying a 
differential equation which can be derived from the details of the underlying stochastic 
process. The complicating factor in this approach when applied to networked stochastic 
systems is that generally the rate of change of Pg{x) will depend not only on those 
variables which contribute to Pg{x), but also on their neighbours, leading to a system 
which is not closed. There is more than one way to perform this analysis and different 
sets of differential equations may be found (for example, compare the approaches to SIR 
epidemics taken in references |14] and |25]), however, a relatively general situation is 
described by a collection of differential equations of the form 



where G is the graph containing those sites in g and their neighbours, and the functions 
T{Xg, y) describe probability flows between states, derived from the microscopic 
dynamics. In this form, such equations are of very little use as they say that the rate 
of change of Pg depends on the higher dimensional distribution Pq-, which is not part 
of the system. Moment- closure techniques seek to address this problem by suggesting 
possible ways to express Pq approximately as a function of Pg. In the next subsection, 
we will derive the maximum-entropy approach to moment-closure. 

2.2. Derivation 

Consider a graph G, covered by a collection G of subgraphs. To each site i G G a 
random variable Xi is attached: write X for the vector containing these variables, and 
Xg for the projection of that vector onto the set of sites also included in the subgraph 
(7 G G. The Xi are assumed to have a joint distribution function Pg which is unknown. 
Suppose that we have knowledge of the marginal distribution on each of the subgraphs, 
given by 



In the context of moment- closure, the subgraph distributions Pg are the level at which 
we wish to close our system of differential equations and G is the larger graph containing 
all variables which influence the time evolution of a particular Pg. For a simple example, 
the pair approximation corresponds to the situation in which G is a single site plus its 
neighbours, and G is the collection of edges in G. 

The core of the problem is as follows: to close our system of differential equations 
at the level of the Pg, it is necessary to approximate the unknown higher-dimensional 
distribution P^, guided only by knowledge of the marginal distributions Pg. 




(2) 



y 



PgiXg) Xg = Xg 



Xi -.i^g 
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It is correct to demand that the function Pq approximating Pq must of course be 
a probabihty distribution itself, and that its marginals Pg should agree with the known 
distributions Pg, that is: 

for all X, and for every g ^ G, we have Pci^) = Pgi^g) ■ (3) 

Aside from these requirements, in a general setting there is no a priori reason to assume 
that Pg should take any particular form. It can therefore be argued [26] that the most 
sensible choice for Pq is the distribution which has maximum entropy subject to the 
above constraint. Put simply, to make any other choice for Pq would introduce an 
additional bias which would require further justification. 

Accepting this reasoning, the calculation is straightforward. It will suffice to satisfy 
(|3]), as the requirement that Pq is a bona fide distribution will then hold automatically. 
The entropy of a distribution P is given by 

E{P) = -J2p{x) \ogP{x). 

X 

To perform the constrained maximisation of this quantity, we introduce one Lagrange 
multiplier Xg{Xg) per subgraph g E G and projected state vector Xg. We thus seek 
extrema of the functional 

X g Xg y Xi-.ifg 

Differentiating, we find 

dK{P] 



-l-logP(a^)-5^A 



g\Xg) ) 



dP{x) 

y 

and thus the maximum entropy approximation for Pq is given by 



Pg{x) = exp |-1 - E ^9(^9) I 



We see that Pq factorises over subgraphs; there exist functions Qg such that 

PGix) = llQgiXg), (4) 

g 

where the equations for Lagrange multipliers imply that the functions Qg collectively 
satisfy 

for all g and Xg, Pg{xg) = '^6{xg- Ug) J]^ Qh{yh) ■ (5) 

y h 

Note that the normalisation of the functions Pg{Xg) was not accounted for during the 
implementation of the Lagrange multipliers, meaning that there are several redundant 
degrees of freedom in the solution set to ([5]) which should be treated with care. Indeed, 
it is straightforward to see that (jl]) is invariant under certain multiplicative transforms 
of the Qg, however, this is only a problem if one intends to work directly with the Qg 
in a numerical context. 
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2.3. Practical implementation 

Although it is known ^26j that (provided the supphed marginals Pg are consistent) 
the maximum entropy distribution Pq always exists and is unique, only in certain 
circumstances can it be written neatly as a function of the marginals. We can 
characterise these situations by considering the graph with sites given by the elements 
of G (that is, the subgraphs of G for which marginals are supplied), and edge set given 
by the rule that (7 ~ /i if and only if g and h share at least one site. If "H is a tree, then 

Po(-) = n ;n ) ' 

g V \-\.h^9nh{Xgnh) 

where Pgnh is the marginal distribution of variables contained in both g and h, and we 
set P$ = I for consistency. 

For a general setting in which is not a tree, there is no algebraic expression for 
Pg in terms of the Pg. Numerical solution of the problem is straightforward however, 
for example by a simple process of iterative replacement we describe now. Starting from 
a uniform distribution p(°) , one repeatedly cycles through the subgraphs g E G, at each 
step setting 

p'^'^Hx) 



p(n+l)/^X _ p 



X] 



As n — )■ 00, the distributions P^"-* converge to Pq- Proof of the convergence of this 
procedure may be found in [27], where it is employed in the entirely different context of 
quantifying distributional complexity by analysis of interaction structure. 

The final consideration in the practical application of maximum-entropy moment- 
closure is the inclusion of the approximation in the differential equations it is intended 
to close. As discussed in the introduction, for a given microscopic dynamics there often 
exists more than one set of ODEs purporting to approximate the time evolution of the 
system. We consider the case that one wishes to close a system of differential equations 
of the form ([2]) which we assume are derived from some underlying stochastic process. 
In the fortunate situation that the graph "H of subgraph overlaps is a tree, one may 
simply apply equation ([6]) to close the system by setting 

y 

Away from this special case one only has access to Pq via numerical methods, and no 
such simple expression is possible. However, if, as is very often the case, one intends 
to treat the differential equations numerically, then the lack of a simple closure formula 
is not much of a penalty. At each step of the numerical solution of the differential 
equations, the numerical value of Pg may be recovered efficiently via the iterative 
replacement scheme described above. 
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Alternatively, the equations can always be closed at the level of the functions Qg, 
with the resulting system given by, for all g and Xg, 



T{xg,y) - 6{xg - yg)J2^^osQhiyh) 



0. 



(8) 



As a result of the under-specification of the Lagrange multipliers discussed earlier, there 
are some extraneous degrees of freedom in this system of differential equations, though 
these can be accounted for either by holding constant a suitable number of values Qg{Xg), 
or by imposing additional normalisation constraints on the Qg to prevent divergence. 

To summarise, in the general case for which equation ([7]) does not apply, one has 
the choice of working with Pq numerically, or directly with the Qg via ([8]). 



2.4- Example - pair mean field 

Consider the SIR model on a regular graph of degree k. In [25], the progression of an 
SIR epidemic was modelled by differential equations equivalent to 

±P(S,S) = -2^P{S,S,I) 

jP{S, 1) = ^ {P{S, S, I) - P{I, S, I) - P{S, /)) 

j^P{S, R) = -^P{R, S, I) + ^P{S, I) (9) 
|p(/, I) = 2^ (P(/, S, I) + P{S, /)) - 2 7P(/, /) 

|P(/, R) = ^PiR, ^, J) + 7 /) - R)) . 

In these expressions k denotes the average degree (number of neighbours) of the sites in 
the graph, P{x, y) is probability distribution of a pair of adjacent variables, and 

P(x, y, z) = il- 4>)Pr{x, y, z) + 0PT(a;, y, z) , (10) 

where Pr{x, y, z) is the distribution of three sites in a row, Pt{x, y, z) is the distribution 
of three sites forming a triangle, and is a measure of the frequency of triangles in 
the underlying graph. To close these equations at the level of P{x,y), we are required 
to find expressions for Pji{x,y,z) and PT{x,y,z). Note that although these equations 
involve distributions of dimensions two and three and thus are not of the same form as 
([2]), maximum-entropy moment-closure may still be applied. 

Considering Pr(x, j/, z) first, we are interested in the situation illustrated in Figure 
[1]- we wish to compute the maximum-entropy approximation Pr, under the constraint 
that each of the pairs defined by subgraphs a and b has the known marginal distribution 
P{x,y). From equation (jlj), we see that there exist functions Qa and Qb such that 

Pnix, y, z) = Qaix, y)Qbiy, z) . 
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Figure 1: The graph R consisting of three sites arranged in a row, covered by subgraphs 
a and b given by hnked pairs of sites. 




Figure 2: The graph T consisting of three sites arranged in a triangle, covered by 
subgraphs a, b and c, given by hnked pairs of sites. 



The constraints ([5]) then give first for subgraph a, 

P(x, y) = Paix, y) = ^ Qaix, y)Qbiy, z) = Qaix, y) ^ Qk{y, z) 

z z 

and then for subgraph b, 

P{y, z) = Pb{y, z) = Y] Qa{x, y)Qb{y, z) = Qb{y, z) V ^^^'/^^ . 

Qb{y,z) = ^^^Y.Qb{y,z). 

Combining the two yields 

PR{x,y,z) = — . 11 

Piy) 

Alternatively, we might observe that there are only two subgraphs and we may then 
apply the simple form directly to obtain the same result. 

Equation ( ITT]) is precisely the usual pair approximation to the joint distribution 
of three variables connected in a row, so we see that the maximum-entropy approach 
agrees with the usual methods in this case. Turning attention now to the case of the 
triangle (illustrated in Figure [2]), we find that the situation is not so straightforward. 
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The graph of subgraph overlaps in this case is not a tree and thus there is no simple 
algebraic form for Pt- In [191 E] an approximation is suggested equivalent to 



This is expression is sometimes [191 [2H] attributed to Kirkwood [29]. Unfortunately, 
for almost all Pt this function is not a distribution itself and does not agree with the 
known marginals on pairs. In defense of this expression, it is usually clearly advertised 
as an approximation made for convenience, and besides we will see later that it holds 
up remarkably well when used to numerically integrate equations (Q, despite being 
inaccurate in general. Note that f lT2|) is precisely the result one would obtain from 
naively (incorrectly) applying ([6]) in this situation. 

For numerical applications, the maximum-entropy approximation to Pt may easily 
be computed using the iterative replacement algorithm outlined earlier. Such an 
approach may however be unappealing to those who would like a neat closed system of 
ODEs, whose terms do not depend on quantities which must be computed numerically. 
In that case, one might choose to halt the iterative replacement process after some 
number of steps. For example, three steps will yield the approximate form 



which is a viable alternative to (fT2|) with the advantages firstly of being a genuine 
probability distribution and secondly of agreeing with the marginal distribution on the 
pair {x,y). As will be reported in the next section, numerical experiments comparing 
the approximations f[T2l) and f[T3l) have surprising results. 

3. Numerical experiments 

3.1. An epidemic on the square lattice 

As a first test of the utility of the maximum-entropy approach, we attempt to predict the 
behaviour of the SIR model on a square lattice by numerically integrating a moment- 
closed system of equations. The pair approximation on a square lattice has already 
been considered in [30], so we will develop a higher order theory based on clusters of 
five connected sites. 

Let g be the graph of a single site and its neighbours in the square lattice - see 
Figure [3a| for an illustration. To close the equations for SIR epidemics at the level of 
Pg will require approximation of the distribution Pq on the graph G of all sites within 
distance two of a central site (Figure l3b|) . This bigger graph contains five copies of g 
(Figure [3cl) . which are used as the basis of the maximum-entropy approximation. 




(12) 




(13) 
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(a) The graph g (b) The graph G oi g and its (c) The collection G of five 

neighbours in the lattice copies of g which cover G 

Figure 3: Developing a theory for the SIR model on a square lattice which is closed at 
the level of Pg (a site and its neighbours) requires estimating the distribution of variables 
in Pg, which itself contains five copies of g. 

As per the earlier discussion, one formulation for the moment-closed system of 
differential equations is given by 

j^Pgixg) = J2PGiy)rixg,y), (14) 

y 

however, to compute the coefficients r{Xg,y) one must enumerate the possible changes 
to each /i G G which could occur given the configuration y of the sites in G. Explicitly, 

T{xg, y) = Y^ ^ Xg\y)- I^^Xg = y (^1 - F{xh ^ Xh \ ?/)) , 

where i— )■ is used as a shorthand for a change taking place in one timestep. 

Note that for SIR dynamics there are three possible states per site which, given 
that there are five sites in g and thirteen in G, means that the system ( 1141] contains 
243 equations and moreover each one has 1594323 terms in its sum. The solving of this 
system is most certainly a job for the computer - see [21] for a discussion of this problem 
in relation to the pair approximation. 

Instead of attempting to list the equations themselves, a better understanding can 
be gained through an examination of the steps involved in numerically integrating them. 
Starting from an initial condition in which each state of each site is chosen independently 
at random, 

Pg{Xg) = l[P{Xi), 

we apply the Euler forward method of numerical integration. At each step, time is 
moved on by a small amount t i— t + At, and the following computation is performed: 

(i) Initially, for each Xg, set 
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Figure 4: Comparison of simulation results (circles) of SIR dynamics on a square lattice 
with predictions of the pair approximation (dashed line) and the five site maximum- 
entropy approximation (solid line) described in the text. The diesease parameters are 
f3 = 0.6, 7 = 0.1 and a = 0. 



(ii) For every configuration y of variables in G 

(a) Compute the maximum-entropy approximation Pciv) by iterative replacement 

(b) Compute the probability 5 of a change to the state of the central variable in 
this configuration, write y' for the resulting configuration 

(c) For each h E G put 

f/M ^ j^P.iVH) - SPciy) 

and 

(in) For each Xg, update 

The results of this process for two different parameter choices are presented in Figures 
m and [5l In each case a single random simulation of a discrete time SIR epidemic on 
a toroidal square lattice of size N = 10^ was performed according to the microscopic 
dynamics outlined in Section 2.1. 

For Figure H] the parameters for infection, recovery and return to susceptibility 
were chosen to be /3 = 0.6, 7 = 0.1 and a = 0, respectively. Alongside the result 
from the simulation are those obtained by the numerical integration scheme for the 
cluster approximation described above, as well as the pair approximation of [30] for 
comparison. As is clearly visible in the figure, the cluster approximation provides a 
considerable improvement to the accuracy of the prediction. 
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Figure 5: Comparison of simulation results (circles) of SIR dynamics on a square lattice 
with predictions of the pair approximation (dashed line) and the five site maximum- 
entropy approximation (solid line) described in the text. The diesease parameters are 
f3 = 0.6, 7 = 0.1 and a = 0.01. 

An identical set-up was used for Figure O with the exception that return to 
susceptibility was allowed with probability a = 0.01. This change has the effect of 
moving the system into an oscillatory regime in which both the pair approximation and 
the cluster approximation are dramatically less accurate. The failure of moment-closure 
methods in this regime can be attributed to the development of long range correlations in 
the model which appear to play a central role in controlling the frequency of oscillations 
observed. It is worth noting, though, that for many practical applications the goal 
of moment-closure methods is to quantify the steady states of the system (and their 
stability), rather than to accurately reproduce transient features such as the decaying 
oscillations observed here. In this regard, the cluster approximation is again a clear 
improvement over the pair approximation. 

3.2. Comparison of pair approximations 

In the previous section, we saw that the maximum-entropy approximation for three 
sites in a row agreed with the usual pair approximation for such a configuration. For 
a triangular arrangement of sites, however, the situation is rather more complicated. 
The commonly cited approximation F, given in equation ( |T2|) does not satisfy the 
requirement of being a probability distribution. An alternative expression Pt was 
suggested here in equation (|T3l) . found as the partial implementation of maximum- 
entropy, and may possibly offer an improvement. 

The following pair of numerical experiments are designed to give a casual assessment 
of the performance of these approximations. Firstly, the outputs are compared directly 
to the empirical distribution they are intended to approximate, secondly, the expressions 
F and Pt are employed in the numerical integration of the differential equations to 
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see how success or failure in the first test may affect the predictive power of the system 
they close. 

The approximations in question deal with triangles, so a network with plenty of 
these will provide an appropriate setting for the numerical tests. We choose a small- 
world network constructed as follows: N sites are arranged in a circle and each is 
connected to its four nearest neighbours, each edge then has one end randomly rewired 
with probability 6 (= 0.1 for the present case). Random graphs of this type have 
frequently been used as models of social networks and thus the spread of disease on 
such networks is interesting in its own right. For the simulations used here a single 
graph of this type of size = 1000 was generated in this way. The frequency of 
triangles in this graph was measured at = 0.355. A sample of 1000 independent runs 
of SIR dynamics were then performed on this graph, using an initial condition of 20 
infective sites chosen at random, and parameters for infection, recovery and return to 
susceptibility given by /3 = 0.8, 7 = 0.2 and a = 0, respectively. 

For the first test, at each tenth of a timestep (this is possible since time is rescaled 
by a factor if A^^^ in the microscopic dynamics) the empirical distribution PT{x,y,z) 
of three sites arranged in a triangle was computed using equation ([T]). The empirical 
marginal distribution P{x,y) of pairs of edges which are involved in a triangle was 
computed by summing Px{x,y,z) over one argument. 

From equations (E]) and (fTOll . one sees that the quantities in need of approximation 
are Pt{S,S,I), Pt{I,S,I) and Pt{R,S,I), which between them describe the 
distribution of the state of the third site in a triangle in which the others form a 
susceptible-infected pair. The shaded areas in the top plot of Figure |6a] shows the 
empirical conditional density Pt( x \S, I) = Pt{x, S, I)/P{S, I) for x = S,I, R, obtained 
from the simulation. Since they form a probability distribution, they necessarily 
sum to one. For the central plot of Figure |6al the approximate conditional density 
Pt{x \S, I)/ P{S, I) at each data point was computed using equation ( lT3l) . with the 
empirical pair distribution P{x,y) taken as the input. Finally, the lower plot of the 
same figure shows the result F of equation ( IT2l) . computed in the same way. 

The difference in the accuracy of the approximations F and Pt shown in Figure [6a] 
is striking. Whilst Pt appears to give a reasonably close approximation to the empirical 
conditional distribution Pt{-\S,I), the results of F are a dramatic underestimate, 
particularly of the fraction of susceptibles. 

In the production of Figure |6al the empirical pair distribution P{x, y) was used 
for as the basis for F and Pt- If instead the approximations are employed for their 
intended purpose of closing the system of equations ([9]), the pair distribution used 
will evolve over time according to these equations and, as a result, errors may quickly 
become compounded to produce an inaccurate prediction of the progress of the disease. 
In Figure l6bl the time evolution of the fraction of infected sites as averaged over the 
simulations (circles) is shown alongside the predictions resulting from system (|9]) closed 
using approximations F (dashed line) and Pt (solid line). 

As is clearly visible in the figure, the success of equation Pt as an approximation 
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(a) Top - empirical distribution Pt{ ■ /), computed as described in the text, 
Centre - prediction of the same quantity using equation (IT5|) . 
Bottom - prediction of the same quantity using equation (H! 
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(b) Circles - fraction of infected sites on a small- world network as described in the text, 
Solid line - prediction of the same quantity using approximation in system 
Dashed line - prediction of the same quantity using approximation in system © . 
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to the distribution Pt has not translated into a good prediction when used to close 
the system of differential equations. In contrast, the approximation F is relatively 
successful when used in this way, providing a much closer fit to the data from numerical 
simulations. 

That the dramatic failure of F in reconstructing the density Pt does not result in a 
poor prediction of the progress of the disease is surprising and requires explanation. 
Note that both approximation schemes make use of the same estimate Pr for the 
distribution Pr of three sites in a row, which here accounts for around 65% of the 
estimate of P{x,y,z), see equation (fTOj) . Performing a similar numerical experiment to 
assess the accuracy of this shared component reveals that (for this graph and parameter 
choice at least) it provides a substantial overestimate of the conditional density of 
susceptibles Pr^S \S, I). In contrast. Figure [6a] shows that approximation F gives a 
large underestimate of the quantity Pt{S \ S, I), which has the effect of lowering the 
overall estimate of P{S, S, I), acting to cancel out most of the erroneous overestimation 
introduced by Pr. Since the approximation Pt provides a much better estimation of 
Pt{S 15*,/), the error in the shared component Pr is carried over unchecked, resulting 
in a poorer overall performance. 

To summarise, it appears that the success of the approximation scheme proposed 
in [ini [25] may in fact be largely due to a fortunate cancellation caused by two poor 
approximations giving large errors in opposite directions. As the experiments of this 
subsection have demonstrated, improving one part of the scheme actually has the counter 
productive effect of destroying this curious symmetry of errors, resulting in a worse 
performance overall. 



3.3. Sensitivity to errors in approximation 

The central premise of moment-closure methods, including the maximum-entropy 
approach discussed in this article, is that having access to a 'good' approximation scheme 
to higher order correlations will translate into a high accuracy of prediction from a 
system of differential equations employing them. It is therefore sensible to ask: precisely 
how good is 'good'? To put this query in concrete terms, the following numerical 
experiment tests the sensitivity of moment-closure to errors in the approximation. 

Consider the following system of differential equations describing the time evolution 
of the fraction of susceptible, infected and recovered individuals in a standard SIR 
epidemic: 

j^PiS) = -f3P{S)P{I\S) + aP{R) 

J^P{S) = PP{S)P{I\S)-^P{I) (15) 

|P(5) = 7P(/) - aP{R) . 

Here the term P{I\S) denotes the conditional probability that a randomly chosen 
neighbour of a randomly chosen susceptible site is infected, so for example /3P{S)P{I\S) 
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Figure 7: Solid line - fraction of recovered sites measured from a single run of SIR 
dynamics on a toroidal square lattice of size N = 10^. Dashed lines - high and low 
results from numerical integration of differential equation system f|T5l) . with the pair 
distribution modified but within e = 0.001 in KL-divergence from the empirical value. 



gives the probability that a susceptible site is chosen for update and is then infected 
by a neighbour. These equations give an exact description (on average and in the limit 
N — )• oo) of the progress of an epidemic, however they are not closed as they depend on 
the unknown quantity P{I\S). 

Now consider the following numerical experiment. SIR dynamics are performed on 
a large square lattice and at each micro-timestep the empirical conditional probability 
P{I\S) is measured directly from the simulation. Simultaneously with this process, the 
system of differential equations ( !T5|) is numerically integrated using the numerical value 
of P{I\S) taken from the empirical measurement at the same moment in time. Since 
the equations ( IT5l) are exact and P{I\S) is determined from the simulation, the output 
from the simulation and the numerical integration will be almost indistinguishable - 
this fact is easily verified on the computer. 

By repeating this process, but now replacing P{I\S) in the numerical integration by 
a slightly perturbed value, one will gain some insight into the robustness of the system 
f lTSj) to errors in approximation of P{I\S). 

The results of such an experiment are shown in Figure [71 A single run of SIR 
dynamics was performed on a toroidal square lattice of size = 10^, with parameters 
f3 = 0.6, 7 = 0.1 and a = 0. The high and low perturbations of P{I\S) were obtained 
by adjusting this value up or down as far as possible whilst keeping the KuUback Leibler 
divergencqjlJ from the empirical distribution P{-\S) less than e = 0.001. 

For most practical purposes, a KL-divergence of e = 0.001 is extremely low and 
would represent a very close approximation to the true distribution. As demonstrated 
in Figure [TJ however, a such small range of error in the distribution can still result in 

II KL-divergence is a measure (not a metric) of the difference between two probability distributions, 
defined in the discrete case by D{Pi\\P2) — ^i(^) los[^i(^)/^2(a;)]- 
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considerable uncertainty in the resulting prediction of system ( lT5i) . as the error is quickly 
compounded over the course of the epidemic. Even with such a precise approximation 
to P{I\S), the final size of the epidemic may be over- or under-estimated by around 
10%. 

On first glance it may seem unfair to choose extremal errors in P{I\S), however, 
it should be noted that, as we saw in the previous subsection, errors in moment-closure 
approximations may often be systematic rather than random and hence such a choice 
is quite realistic. 

4. Conclusion 

Moment-closure methods are vital tools in the analysis of stochastic systems with 
network structure, allowing one to make mathematical predictions about the behvahiour 
of models previously only accessible to simulation. The success of such methods is 
certainly not in question, however, it is mitigated by several issues: 

(i) the most common moment-closure method (the pair approximation) is inconsistent 
in its treatment of triangles 

(ii) higher order approximation schemes usually involve an arbitrary and unmotivated 
choice of some conditional independence structure 

(iii) moment-closed systems of differential equations (particularly those of higher order) 
are computationally intensive to solve 

(iv) the insertion of an approximation scheme into a system of differential equations 
obfuscates the effects of the accuracy of the approximation. 

The work presented in this article provides an effective solution to the first two issues 
on this list. By following a systematic approach based on maximisation of entropy, one 
is able to obtain approximation schemes of any desired size and complexity, without the 
need for any arbitrary or inconsistent assumptions. 

In those situations in which a conditional independence structure genuinely is 
present, such as the pair approximation of three sites in a row, it is recovered by 
maximum-entropy and simple algebraic equations for the approximation may be derived. 
In more general situations where no such simple equations exist, the maximum-entropy 
approximation may be computed efficiently by a process of iterative replacement. 
Resorting to numerical approaches is certainly an aesthetic failing but it is not a practical 
one as moment-closed systems of differential equations are already computationally 
intensive to solve and the use of the maximum-entropy approximation causes very little 
additional burden. Indeed, numerical computation of the closure terms is all that is 
required for analysis using software packages such as AUTO [32], and so the maximum- 
entropy method may be straightforwardly implemented there. 

To investigate the application of maximum-entropy moment- closure, issue (ii) above 
was considered in some detail. The usual pair approximation to triangles suffers from 
the mathematical drawbacks of neither being properly normalised, nor agreeing with 
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the known marginal distribution on pairs. The maximum-entropy approximation by 
definition does not suffer either of these problems, however, it cannot be written 
down in a neat closed form. As a middle ground, an alternative pair approximation 
to triangles was obtained by halting the iterative replacement algorithm after three 
steps. In numerical experiments based on the SIR model on a small world network, this 
alternative scheme vastly outperformed the usual approximation in the job of predicting 
the joint distribution of three sites forming a triangle. However, when coupled with the 
approximation for three sites in a row and used to close a system of differential equations, 
the results of the new approximation did not agree with those of numerical simulations, 
whilst making the less accurate approximation to triangles somehow resulted in a better 
fit to the data. Further numerical investigation revealed that this unusual effect is due to 
a fortuitous cancellation of error between the pair approximations for rows and triangles, 
whose balance was upset by improving one half of the approximation scheme but not 
the other. 

The phenomenon that an improved approximation scheme can lead to worse results 
when used to close a system of differential equations raises often overlooked questions 
about the reliability and sensitivity of moment-closure methods in general. As the simple 
numerical experiment presented in Section 3.3 shows, an approximation which is highly 
accurate but suffers from a very small systematic bias can yield results with errors in 
prediction which are several orders of magnitude larger. This suggests that predictions 
made by moment-closure methods (whose biases we do not know) should be viewed as 
being somewhat flexible. 

Moving forward, there is considerable scope to apply the methods outlined in 
this article to the study of stochastic systems on networks. It would be particularly 
interesting to take a maximum-entropy approach to the study of moment closure 
methods in adaptive networks [I5l [161 113 HH] ^^^^ stochastic corrections [H]. 
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